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We establish methods for quantum state tomography based on compressed sensing. These methods are spe- 
cialized for quantum states that are fairly pure, and they offer a significant performance improvement on large 
quantum systems. In particular, they are able to reconstruct an unknown density matrix of dimension d and rank 
r using 0{rd\o^ d) measurement settings, compared to standard methods that require settings. Our meth- 
ods have several features that make them amenable to experimental implementation: they require only simple 
Pauli measurements, use fast convex optimization, are stable against noise, and can be applied to states that are 
only approximately low-rank. The acquired data can be used to certify that the state is indeed close to pure, so 
no a priori assumptions are needed. We present both theoretical bounds and numerical simulations. 



The tasks of reconstructing the quantum states and pro- 
cesses produced by physical systems — known respectively 
as quantum state and process tomography [1] — are of in- 
creasing importance in physics and especially in quantum in- 
formation science. Tomography has been used to characterize 
the quantum state of trapped ions [ ] and an optical entan- 
gling gate [3] among many other implementations. But a fun- 
damental difficulty in performing tomography on many-body 
systems is the exponential growth in the state space dimen- 
sion. For example, to get a maximum-likelihood estimate of 
a quantum state of 8 ions, Ref. [ ] required hundreds of thou- 
sands of measurements and weeks of post-processing. 

Still, one might hope to overcome this obstacle, because 
the vast majority of quantum states are not of physical interest. 
Rather, one is often interested in states with special properties: 
pure states, states with particular symmetries, ground states 
of local Hamiltonians, etc., and tomography might be more 
efficient in such special cases [ ]. 

In particular, consider pure or nearly pure quantum states, 
i.e., states with low entropy. More precisely, consider a quan- 
tum state that is essentially supported on an r-dimensional 
space, meaning the density matrix is close (in a given norm) 
to a matrix of rank r, where r is small. Such states arise in 
very common physical settings, e.g. a pure state subject to a 
local noise process [20]. 

A standard implementation of tomography [3, 6] would use 
<P or more measurement settings, where d ~ 2" for an n- 
qubit system. But a simple parameter counting argument sug- 
gests that 0{rd) settings could possibly suffice — a signif- 
icant improvement. However, it is not clear how to achieve 
this performance in practice, i.e., how to choose these mea- 
surements, or how to efficiently reconstruct the density ma- 
trix. For instance, the problem of finding a minimum-rank 
matrix subject to linear constraints is NP-hard in general [7]. 

In addition to a reduction in experimental complexity, one 
might hope that a post-processing algorithm which takes as 
input only 0{rd) <C d^ numbers could be tuned to run con- 
siderably faster than standard methods. Since the output of the 



procedure is a low-rank approximation to the density opera- 
tor and only requires 0{rd) numbers be specified, it becomes 
conceivable that the run time scales better than 0{d^), clearly 
impossible for naive approaches using dense matrices. 

In this Letter, we introduce a method to achieve such dras- 
tic reductions in measurement complexity, together with ef- 
ficient algorithms for post-processing. The approach further 
develops ideas that have recently been studied under the label 
of "compressed sensing". Compressed sensing [8] provides 
techniques for recovering a sparse vector from a small num- 
ber of measurements [9]. Here, sparsity means that this vector 
contains only a few non-zero entries in a specified basis, and 
the measurements are linear functions of its entries. When 
the measurements are chosen at random (in a certain precise 
sense), then with high probability two surprising things hap- 
pen: the vector is uniquely determined by a small number of 
measurements, and it can be recovered by an efficient convex 
optimization algorithm [8]. 

Matrix completion [10-12] is a generalization of com- 
pressed sensing from vectors to matrices. Here, one recov- 
ers certain "incoherent" low-rank matrices X from a small 
number of matrix elements Xj j. The problem of low-rank 
quantum state tomography bears a strong resemblance to ma- 
trix completion. However, there are important differences. 
We wish to use measurements that can be more easily im- 
plemented in an experiment than obtaining elements pij of 
density matrices. Previous results [10-12] cannot be applied 
to this more general situation. We would also like to avoid any 
unnatural incoherence assumptions crucial in prior work [10]. 

Our first result is a protocol for tomography that overcomes 
both of these difficulties: it uses Pauli measurements only, 
and it works for arbitrary density matrices. We prove that only 
0{rd log^ d) measurement settings suffice. What is more, our 
proof introduces some new techniques, which both generalize 
and vastly simplify the previous work on matrix completion. 
We sketch the proof here; a more complete version appears in 
[25]. This provides the basic theoretical justification for our 
method of doing tomography. 



2 



We then consider a number of practical issues. In a real 
experiment, the measurements are noisy, and the true state is 
only approximately low-rank. We show that our method is ro- 
bust to these sources of error We also describe ways to certify 
that a state is nearly pure without any a priori assumptions. 

Finally, we present fast algorithms for reconstructing the 
density matrix from the measurement statistics based on 
semidefinite programming - a feature not present in earlier 
methods for pure-state tomography [4-6]. These are adapted 
from algorithms for matrix completion [14], and they are 
much faster than standard interior-point solvers. Reconstruct- 
ing a low-rank density matrix for 8 qubits takes about one 
minute on an ordinary laptop computer 

While our methods do not overcome the exponential growth 
in measurement complexity (which is provably impossible for 
any protocol capable of handling generic pure states), they 
do significantly push the boundary of what can be done in a 
realistic setting. 

Our techniques also apply to process tomography: to 
characterize an unknown quantum process 8, prepare the 
Jamiolkowski state p£, and perform state tomography on pg. 
Our methods work when £ can approximately be written as a 
sum of only a few Kraus operators, because this implies that 
P£ has small rank. 

Matrix recovery using Pauli measurements. We consider 
the case of n spin- 1/2 systems in an unknown state p [16]. 
An n-qubit Pauli matrix is of the form w = {S)r=i where 
Wi G {1, (7^, cr", cr^}. There are (P such matrices, labeled 
'w{a),a £ [i-,cP]- The protocol proceeds as follows: choose 
m integers Ai, . . . , Am S [1, (P] at random and measure the 
expectation values tr pw{Ai). One then solves a convex opti- 
mization problem: minimize ||cr||tr [17] subject to 

tr(T = l, tTw{Ai)a = tTw{Ai)p. (!) 

Theorem 1 (Low-rank tomography) Let p be an arbitrary 
state of rank r. If m ^ crfr log^ d randomly chosen Pauli ex- 
pectations are known, then p can be uniquely reconstructed by 
solving the convex optimization problem (1) with probability 
of failure exponentially small in c. 

The proof is inspired by, but technically very different from, 
earlier work on matrix completion [10]. Our methods are 
more general, can be tuned to give tighter bounds, and are 
much more compact, allowing us to present a fairly complete 
argument in this Letter A more detailed presentation of this 
technique - covering the reconstruction of low-rank matrices 
from few expansion coefficients w.r.t. general operator bases 
(not just Pauli matrices or matrix elements) - will be pub- 
lished elsewhere [_,)]. 

Proof: Here we sketch the argument and explain the main 
ideas; detailed calculations are in the EPAPS supplement. 

Note that the linear constraints (1) depend only on the 
projection of p onto the span of the measured observables 
w{Ai), . . . , w{Am)- This is precisely the range of the "sam- 
pling operator" 7^ : p ^ ^^"^iw(AOtrpw(Ai). (Note 



that E[7?.(p)] = p.) Indeed, the convex program can be writ- 
ten as miucr \\a\\tr s.t. TZcr = TZp. Evidently, the solution is 
unique if for all deviations A := cr — p away from p either 
7^A^0or||p + A||tr > Ijplltr. 

We will ascertain this by using a basic idea from con- 
vex optimization: constructing a strict subgradient Y for the 
norm. A matrix F is a strict subgradient if ||p + A|jtr > 
llplltr + tr yA for all A 7^ 0. The main contribution below is 
a method for constructing such a Y which is also in the range 
of TZ. For then 7?.A = implies that A is orthogonal to the 
range of TZ, thus tr FA = and the subgradient condition 
reads ||p + A||tr > \\p\\ti- This implies uniqueness. (In fact, 
it is sufficient to approximate the subgradient condition in a 
certain sense). 

Let E be the projection onto the range of p, let T be the 
space spanned by those operators whose row or column space 
is contained in range p. Let Vt be the projection onto T, Vj^ 
onto the orthogonal complement. Decompose A = At+A;^^, 
the parts of A that lie in the subspaces T and T^. We distin- 
guish two cases: (i) ||At1|2 > rf^ll A-|^||2, and f/;j ||At||2 < 

d2||A^||2[17]. 

Case ( i) is easier. In this case, A is well-approximated by 
At and essentially we only have to show that the restric- 
tion A := Vt'TZT't of 7?. to T is invertible. Using a non- 
commutative large deviation bound (see EPAPS supplement), 

Vt[\\A - 1t\\ >t\< Adrer^^''/^ (2) 

where k = m/{dr) [17]. Hence the probability that \\A — 
1t\\ > ^ is smaller than 4dre~''/^^ =: pi. If that is not the 
case, one easily sees that ||7?.A||2 > 0, concluding the proof 
for this case. 

Case (ii) is more involved. A matrix Y G 
span(i(;(j4i), . . . , w{Am)) is an almost subgradient [18] if 

\\VtY -E\\2<l/{2d^), llP^rll < 1/2. (3) 

First, suppose such a Y exists. Then a simple calculation (see 
EPAPS) using the condition (ii) shows that 72.A = indeed 
implies \\p + A|jtr > ||p||tr as hinted at above. This proves 
uniqueness in case (ii). The difficult part consists in showing 
that an abnost-subgradient exists. 

To this end, we design a recursive process (the "golfing 
scheme" [2: ]) which converges to a subgradient exponentially 
fast. Assume we draw / batches of n^rd Pauli observables in- 
dependently at random (kq will be chosen later). Define re- 
cursively Xq — E, 

i 

Y, = J2 T^jXj-i' X,^E- VtY,, (4) 

Y = Yi. Let TZi be the sampling operator associated with the 
ith batch, and Ai its restriction to T. Assume that in each run 
\\Ai — It II 2 < 1/2. Denote the probability of this event not 
occurring by p2 . Then 

||X,||2 = \\X,^l-VTn,X^-lh 

= \\{1t ~ Ai}X,^l\\2 < l/2||X,_i||2, 
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so that ||Xi|l2 < 2-*||Xo|| = Hence, Y = Yi fulfills 

the first part of (3), as soon as I > \og2{2(P y/r) . We turn 
to the second part. Again using large-deviation techniques 
(EPAPS) we find \\Vj^n,X,^i\\ < 1/ {4^)\\X,_i\\2 with 
some (high) probability (1 — ps). Therefore: 

\\V^Y,\\ < ^||7'^7^,X,_l|| < ^£2-' < i (5) 

which is the second part of (3). 

Lastly, we have to bound the total probability of failure 
Pf < Pi + P2 + P3- Set kq = 64/i(l + \n{8dl)), which 
means that m = dr{\nd)^ 0(1) coefficients will be sampled 
in total. A simple calculation gives p/ < e~^. This completes 
the proof of our main result. □ 

In the remaining space, we address the important aspects 
of resilience against noise, certified tomography, and numeri- 
cal performance. Owing to space limitations, the presentation 
will focus on conceptual issues, with the details in [24]. 

Robustness to noise. Realistic situations will differ from 
the previous case in two regards. First, the true state pt may 
not be low-rank, but only well approximated by a state p of 
rank r: \\pt ~ p\\2 < £i- Second, due to systematic and statis- 
tical noise, the available estimates for the Pauli expectations 
are not exactly ti ptw (a), but of the form tiojuula) for some 
matrix w. Assume \\TZuj — TZpt\\2 < £2 (in practical situations, 
£2 may be estimated from the error bars associated with the in- 
dividual Pauli expectation values [21]). In order to get an esti- 
mate for Pt, choose some A > 1 and s > \{y/dFJm)ei + £2, 
and solve the convex program 

min||(7||tr , subject to llT^cr — 7?.cj|| 2 < £■ (6) 

Observation 1 (Robustness to noise) Let pt be an approxi- 
mately low-rank state as described above. Suppose m = 
cdr log^ d randomly chosen Pauli expectations are known up 
to an error of e as in (6), and let a* be the solution of (6). 
Then the difference \\a* — pt\\t-c is smaller than 0{eVrd). 
This holds with probability of failure at most plus the 
probability of failure in Theorem 1. 

The proof combines ideas from Ref. [12] with our argu- 
ment above [19]. The main difference from the noise-free 
case is that, instead of using tr FA = 0, we must now work 
with |trFA| < 2||y||2(5. With this estimate. Observation 
1 follows from the noise-free proof, together with some ele- 
mentary calculations (see EPAPS). We remark that the above 
bound is likely to be quite loose; based on related work in- 
volving the "restricted isometry property," we conjecture that 
the robustness to noise is actually substantially stronger than 
what is shown here [13]. 

Certified tomography of almost pure states. The preced- 
ing results require an a priori promise: that the true state pt 
is (5i -close to a rank-r state. However, when performing to- 
mography of an unknown state, neither r nor 61 are known 
beforehand. There are a few solutions to this quandary. First, 



r and may be estimated from other physical parameters of 
the system, such as the strength of the local noise [20]. 

Another approach is to estimate r and 5i from the same 
data that is used to reconstruct the state. When r = 1, this 
approach is particularly effective, in entirely assumption-free 
tomography: one can estimate 5i, using only 0{d) Pauli ex- 
pectation values. This is because 61 is related to the purity 
Tr p^, which has a simple closed-form expression in terms of 
Pauli expectation values. See EPAPS for details. We get: 

Observation 2 (Certified tomography) Assume that the un- 
known physical state is close to being pure. Then one can 
find a certificate for that assumption, and reconstruct the 
state with explicit guarantees on the reconstruction error, from 
0{cd\o^ d) Pauli expectation values. The probability of fail- 
ure is exponentially small in c. 

Finally, when the state is approximately low-rank but not 
nearly pure (r > 1), one may perform tomography using dif- 
ferent numbers of random Pauli expectation values m. When 
m is larger than necessary (corresponding to an over-estimate 
of r), we are guaranteed to find the correct density matrix. 
When m is too small, we find empirically that the algorithms 
for reconstructing the density matrix (i.e., solving the convex 
program (1)) simply fail to converge. 

A hybrid approach to matrix recovery. Here we describe 
a variant of our tomography method that makes the classi- 
cal post-processing step (i.e., solving the convex program (1) 
to reconstruct the density matrix) faster. This method also 
uses random Pauli measurements, but they are chosen in a 
structured way. Any Pauli matrix is of the form w^a, v) = 
{g))!^ii"'='"=(cr^)"'"(cr^)^^ foTu,v e {0,1}". We choose a 
random subset S C {0, 1}" of size 0{r polylog(d)), and then 
for all M € S and v e {0, 1}", measure the Pauli matrix 
w{u, v). We call this the "hybrid method" because it is equiv- 
alent to a certain structured matrix completion problem. This 
fact implies that certain key computations in solving the con- 
vex program (1) can be implemented in time 0{d) rather than 
0{d^) [14]. However, the hybrid method is not covered by 
the strong theoretical guarantees shown earlier, though it does 
give accurate results in practice. For a more complete discus- 
sion, see the EPAPS supplement. 

Numerical results. We numerically simulated both the 
random Pauli and hybrid approaches discussed above. 
For both approaches, we used singular value thresholding 
(SVT) [14]. Instead of direcdy solving Eq. (6), SVT mini- 
mizes T||(T||tr + ||o'||2/2 subject to I tr((T — uj)w{Ai)\ < S, 
which is a good proxy to Eq. (6) when r dominates the sec- 
ond term; the programs are equivalent in the limit t 00 
(provided Eq. (6) has a unique solution) [1 -r]. Estimating the 
second term for typical states suggests choosing 2rr ^ 1; we 
use T = 5. To simulate tomography, we chose a random state 
from the Haar measure on a d x r dimensional system and 
traced out the r-dimensional ancilla, then applied depolariz- 
ing noise of strength 7. We sampled expectation values as- 
sociated with randomly chosen operators as above, and added 
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FIG. 1: Average fidelity and trace distance vs. (scaled) number of 
measurement settings m for random states of n = 8 qubits, so d = 
2". As discussed in the text, the sampled states had rank r = 3, 
depolarizing noise of 5% and Gaussian statistical noise with a = 
0.1/d. Both the random Pauli and hybrid approaches are shown. 

additional statistical noise (respecting Hermiticity) which was 
i.i.d. Gaussian with variance and mean zero. We used SVT 
and quantified the quality of the reconstruction by the fidelity 
and the trace distance for various values of to, each averaged 
over 5 simulations. This dependence is shown in Fig. 1. The 
reconstruction is remarkably high fidelity, despite severe un- 
dersamphng and corruption by both depolarizing and statisti- 
cal noise [26]. Using the hybrid method with 8 qubits on a 
rank 3 state plus 7 = 5% depolarizing, and statistical noise 
strength ad = 0.1, we typically achieve 95% fidelity recon- 
structions in under 10 seconds on a modest laptop with 2 GB 
of RAM and a 2.2 GHz dual-core processor using MATLAB 
— even though 90% of the matrix elements remain unsam- 
pled. Increasing the number of samples only improves our 
accuracy and speed, so long as sparsity is maintained. 

Using truly randomly chosen Pauli observables (instead of 
the hybrid method) slightly increases the processing time due 
to the dense matrix multiplications involved: in our setup 
about one minute. However, this method achieves even bet- 
ter performance with respect to errors, as seen in Fig. 1. 

The simulations above show that our method work for 
generic low rank states. Lastly, we demonstrate the function- 
ing of the approach in the experimental context of the state 
p found in the 8 ion experiment of Ref. [s]. To exemplify 
the above results, we simulated physical measurements by 
sampling from the probability distribution computed using the 
Born rule applied to the reconstructed state p. This state is ap- 
proximately low-rank, with 99% of the weight concentrated 
on the first 11 eigenvectors. The standard deviation per ob- 
servable was 3/d. Fewer than 30% of all Pauli matrices were 
chosen randomly. From this information, a rank = 3 approx- 
imation <T with fidelity of 90.5% with respect to p was found 
in about 3 minutes on the aforementioned laptop. 

Discussion. We have presented new methods for low -rank 
quantum state tomography, which require only 0{rd log^ (d)) 
measurements, where r is the rank of the unknown density 



matrix and d is the Hilbert space dimension. Our methods are 
based on and further develop the new paradigm of compressed 
sensing, and in particular, matrix completion [10, 11]. We use 
measurements that are experimentally feasible, together with 
very fast classical post-processing. The methods perform well 
in practice, and are also supported by theoretical guarantees. 
It would be interesting to further flesh out the trade off be- 
tween the need for measurements that can be performed eas- 
ily in an experiment and the need for sparse matrices during 
the classical post-processing step. It is the hope that this work 
stimulates such further investigations. 
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APPENDIX 
Details of the proof of Theorem 1 

While this publication contains a complete proof of all the 
claims relevant for quantum tomography, the reader is invited 
to consult the more general and explicit presentation in Ref. 
[25] (and soon [24]). Below, we provide those details of the 
proof of Theorem 1 , which were left out in the main text. 

We introduce some more formal notations used in the ar- 
gument. Denote the trace inner product between two Her- 
mitian operators p,a by (p, cr) := ti pa. We assume that 
w{Ai), . . . ,w{Am) are independent, identically distributed 
matrix-valued random variables, with w{Ai) drawn from the 
(P Pauli matrices with uniform probability. Thus, we model 
the selection of the observables as a process of sampling with 
replacement. It is both very plausible and easily provable [27] 
that drawing the observables without replacement can only 
yield better results. 



Non-commutative large-deviation bound 

An essential tool for the proof is a non-commutative large- 
deviation bound from [ : ]. Let S = J^T -^i ^ °f 
matrix-valued random variables (r.v.'s) Xi. Then it is shown 
in [22] that for every A, t > we have 

Pr[i|5i| >t]< 2de-^*||E[e^^]||'". (7) 

It is simple to derive a Bernstein-type inequality from (7). In- 
deed, assume that Y is some operator-valued random variable 
with which is bounded in the sense that < 1 with prob- 
ability one and which has zero mean E[y] — 0. Recall the 
standard estimate 

1 + y < < 1 + y + y"^ 

valid for real numbers y e [—1,1] (actually a bit beyond). 
From the upper bound, we get < 1 + Y + . From the 
lower bound: 

]S[e^] < 1 + E[y2] < cxp(E[y2]) 
=> ||E[e^-]!|<||cxp(E[y2])||^cxp(||E[y2]||). (8) 

In order to apply (8) to (7), we set Y = XX. The parameter 
A is chosen to be A = t/{2ma'^), where cr^ = ||E[X2]||. A 
straight-forward calculation now gives 

Pr[||S'|| > t] < 2de-''/*""'\ (9) 
(valid for < < 2m(j'^ /\\X\\). 



"Case (i)": large-deviation bound 

The first application of (9) is to verify Eq. (2) from the main 
text, which claims that 

Pt[\\A - 1t\\ >t]< Adre-^"^^^. (10) 
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To this end, let Yi be the super-operator defined by 



-rTiwiA,)){w{A,),VTicT)). 



We will employ Eq. (9) on the rv.'s Xj = {Yi - E[y,]), where 
]S[Yi] = -^It- From the fact that x i— > is operator convex, 
one has = ||E[(y - Ey)2]|| < ||E[r2]||. To estimate 
the latter quantity, we bound (using Holder's inequality (c.f. 
[Bhatia, Matrix Analysis])) 



\'PTWa\\2 sup 

teT,||t||2=i 



iwa,tr<\\wa\mi 



< \\war2r\\t\\i < 2- 



and hence 



MY^] = —1E\{wa.Vtwa)Y] 
m 

m d 



which implies cr^ < The claimed Eq. (10) directly fol- 
lows by plugging this estimate of cr^ into the non-commutative 
large-deviation bound (9). 



"Case (ii)": large deviation bound 

The deviation bound before Eq. (5) of the main text follows 
again from (9). Let F be an arbitrary matrix in T. With Xi = 

^^1 



^r^iwiA,))tTw{A,)F: 



tr" - sup ^J2^{^^WaFf{^\ir^Waf 



VnIIV'II 



(15) 



having used that < 1 and that the {d ^^^Wa} form 

an orthonormal basis. Thus 

Pr[||■PJ^7^F|| > i||F||2] < 2de-*''=''/''. (16) 

In the proof, we use (16) for t = l/{Ay/r). Hence the proba- 
bility of failure becomes 

P3 < 2de~'Si. 



Details for Observation 1 



"Case ( ii) ": the approximate subgradient 

Next, consider the claim after Eq. (3) of the main 
text. There, we assumed that Y was a matrix in 
span(w(Ai), . . . , w{Am)) such that 

\\VtY -E\\2<l/{2d^), \\V^Y\\<l/2. (11) 

It is to be shown that 72.A = implies ||p + A||tr > i|p||tr- 

Recall the scalar sign function sign which maps positive 
numbers to +1, to and negative numbers to —1. If a is 
any Hermitian matrix, then sign a is the matrix resulting from 
applying the sign-function to the eigenvalues of a. Note that 

trcr = (signer, cr) (12) 

and recall Holder's inequaUty [23] 

(fTl,fT2) < llailltrlkall (13) 

for any two Hermitian ci, CT2- 

Letting F = sign we compute: 

llp + A|lt, > l|S(p + A)S||tr + l|(l-^)(p + A)(l-£;)|lt, 

> {E,p + E/\E) + {F,/\^) 

= !|pi|t, + (S,AT) + (F,A^)-(y,A) (14) 
= \\p\\t,+ {E-VTY,l\T) + {F -V^Y,l\^) 

> llplltr- 2^||AT||2 + i||A^||t,> llpiltr. 

(Use the "pinching inequality" [23] in the first step; (12), (13) 
in the second. The third step is (12) and using that TZA = 
and Y G range Y implies (Y, A) = 0. The last estimate uses 
(11) and, once more, (13)). 



In this subsection we need to assume that the Paulis are 
sampled without replacement. All previous bounds continue 
to hold — see remark above. Let 

I 

Q: - ^w(A,)Trpu.(A,) 

1=1 

be the projection operator onto range TZ, normalized so that 
IIQII = 1. Define 7 ^, and note that Q = -fTZ. The 
optimization program (6) of the main text becomes min 1 1 cr 1 1 tr^ 
s.t. \\Qa - QLd\\2 < !£■ 

Let A = cr — p. We upper-bound || QA||2 as follows. First, 

IIQAII2 < ||Q(a-C^)||2+ ||Q(C^-Pt)||2+ ||Q(pt-p)||2. 

For any feasible cr, the first term is bounded by 76, while the 
second term is bounded by 7^2 ■ For the third term, note that 
for the fixed matrix — p, E[|| Q{pt — p)!!!] = 7l|Pt ^ Plli' 
by Markov's inequality, ||Q(/3t — p)\W < A^7||pt — pUj, with 
probability at least 1 — [-"]■ Thus we have 

IIQAII2 < 7£ + 7e2 + AV7ei < 276 25 

(where we defined 5 = 72). 

On the other hand, we can also lower-bound || QAj|2 as fol- 
lows: IIQAII2 > IIQArib - ||QA^||2. For the second term, 
we have ||QA;^||2 < ||A^j|2 (we cannot use Markov's in- 
equality, because here we require a bound that holds simulta- 
neously for all A). For the first term, recall from the noise-free 
case that A = VtTZVt satisfies || It - < 1/2 with high 
probability, and hence we have ||QAt||2 > 7||-4At||2 > 
i7||AT||2- So we have 

||QA||2> i7l|AT||2-||A^||2. 
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Combining the above two inequalities and rearranging, we get 

||Ar||2< ^(25+||A^||2) < ^(2<5+||A^||tO. (17) 

We now show that \\p + A||tr < \\p\\ti impHes that A must 
be small. With the estimate (17) at our disposal, we re-visit 
(14): 

l|P+A||t, - llplltr 

> {E, At) + (F, A^) - (r, A) + (F, A) 

= {E- Vt(Y),At) + {F- VUy),A^) + (y, Q(A)) 

> -^||AT||2 + ^||A^||tr-25||F||2 

- ■ ^^^^ + + Iw^rWtr nrh 

= (i_±)||A^||t,.-2<5(i^ + ||y||2). 

We use a crude bound ||r||2 = ||'Pt(>")||2 + ll'Pr (^)ll2 < 
\\rTiY)-E\\2 + \\Eh + \\r^iY)\\Vd<^ + y^+^Vd. 
Then, for reasonable values of the parameters (say d > 16, 
m > 16, r < d/10), we have 

\\p + A\U,- \\p\\t,> ^WA^Wtr- 2SVd. 

So + A||tr < llplltr implies 

llA^Ilt, < fSVd. (18) 

Finally, write II A||tr < \/2r|| At||2 + || A^|| tr, and use (17) 
and (18). After simplifying, substituting in 6 = je, and set- 
ting K = m/ [rd), one obtains 

||A||tr < 6eVf +13e\/^+5e-^ < 0{e^/Vd). (19) 

Finally, we write ||cr - pt||tr < ||cr - p||tr + ||p - Pt||ti- 
The first term is bounded by 0{e\/rd) as shown above; the 
second term is < ||p — pt||2V^ < SiVd < eVd. This gives 
the desired result. 

Certified tomography for almost-pure states 

For almost-pure states (r = 1), it is possible to obtain es- 
timates for Si from only 0{d) Pauli expectation values with- 
out any assumptions. In this subsection, we sketch a simple 
scheme based on this observation: it outputs a reconstructed 
density matrix a, together with a certified bound on the devia- 
tion ||ct — Ptiltr- The algorithm takes two inputs: 0{d log^ d) 
random Pauli expectation values, and the experimentalist's es- 
timate of the measurement precision S2 [21]. 

Concretely, we set r = 1 and aim to put a bound on 
^1 = ||pt — IV") (^'1 II2, where |?/') is the eigenvector of pt cor- 
responding to the largest eigenvalue. Such a bound can be 
obtained in terms of the purity ti pj = UptHi- E.g., 

= Hp, -|V;)(^|||2< 21/2(1 _||p,||2) (20) 



(valid for ||pf|| > 1/2, which can certifiably be tested). Es- 
timating the purity is done in a way analogous to the proof 
of Theorem 1. Choose m i.i.d. random variables Ai taking 
values in [1, d^], and define 5 = (d/m) X^i" 1 I tr 
Then ElS"] = ||a;||^ and thus ||pt||2 > ¥.[S]^/^ - 62- We 
can bound the deviation of S from its expected value by the 
standard (commutative) Chernoff bound. One finds for the 
variance Var(((i/TO)| trw(A)a;p) < ((i/m2)||w||2 < d/m^, 
so that (fori e [0, 1]): 

Pr[|S'- ||a;||2| > t] < 2e-*'"/(4''\ 

Choose TO = Apd/t^ for some /i > 1 to ensure that 

Vv[\S - IIaII^I > t + 262 + 5l] < e-r (21) 

Combining the previous equation with (20), we have arrived 
at a certified estimate for 5i. 



A hybrid approach to matrix recovery 

Matrix recovery using Pauli measurements does lack one 
desirable feature: the classical post-processing (solving the 
convex programs) is more costly, compared to matrix comple- 
tion [10, II]. This is due to the role of sparse linear algebra in 
the SVT (singular value thresholding) algorithm [14]. The ba- 
sic issue is that SVT must handle matrices of the form TZp. For 
matrix completion, TZp is sparse, so basic operations such as 
matrix- vector multiplication take time 0{d) \ but when we use 
random Pauli measurements, Tl{p) is dense, and basic opera- 
tions take time 0{d^). We now describe a "hybrid" approach 
that avoids this difficulty, and works well in practice. The 
main observation is that for certain, carefully selected sets of 
Pauli matrices, TZp is sparse after all. 

Any Pauli matrix is of the form 

n 

fe=i 

for u,v £ {0, 1}". Plainly, the position of the d non- 
zero matrix elements of w{u, v) depends only on u (v en- 
codes only phase information). Now choose a random subset 
S C {0, 1}" of size 0(r polylog(d)), and then for a\\ u e S 
and V £ {0, 1}", measure the Pauli matrix w{u, v). Thus we 
are measuring each of the Pauli strings containing only cr^ or 
identity, together with these same strings "masked" by apply- 
ing a set of size |5| of Pauli strings with a pattern of cr^ and 
identity. Formally, this means 

TZp cc w{u,v)tT{pw{u,v)). 
«es,ue{o.i}" 

It follows that TZp is sparse with only l^ld non-zero matrix 
elements. This "hybrid method" can be viewed as a variant of 
the usual matrix completion problem, where instead of sam- 
pling matrix elements independently at random, we sample 
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groups of matrix elements determined by the random strings 

u&S. 

While the hybrid algorithm works well for generic states, 
certain input states p may fail to be "incoherent enough" w.rt. 
the very specific set expectation values obtained (c.f. [ 1 1), II]). 
For example, when the eigenvectors of p are nearly aligned 
with the standard basis, most of the matrix elements of p are 
nearly 0, and hence matrix completion is impossible. To avoid 
this problem, we suggest to perform a pseudo-random unitary 
U prior to measuring the Pauli matrices. One then uses the 
hybrid method on U pU\ and finally applies to recover 
p. In particular, one can draw U at random from an (approxi- 
mate) unitary fc-design with k ^ n/ log n. Explicit construc- 
tions of such unitaries are known, and can be implemented 
efficiently [15]. 

While we cannot at this point prove rigorous guarantees for 
the hybrid approach, we do show below that randomization 
by approximate fc-designs generates sufficient "incoherence" 
that the original matrix completion algorithms [ 1 0, 1 1 ] would 
work. Because these algorithms call for matrix elements to 
be sampled from a uniform distribution. Observation 3 does 
not rigorously apply to the hybrid scheme. It does, however, 
make it plausible that pseudo-randomization overcomes inco- 
herence problems and that guarantees for the hybrid method 
can be proven in the future. 

Observation 3 (Incoherence from /s-designs) Let p be an 

arbitrary state of rank r and dimension d, and let E be the 
projector onto the support of p. Let i = 1, . . . ,d, de- 
note the standard basis. Let U be drawn at random from 
an (e-approximate) unitary k-design with k n/logn (and 
e ~ andlet \bi) = U\i). Then, with probability at least 

1 — (1/d), the following holds: 

for all i = 1, . . . ,d, ||i?|6i)||2 — /^o^/^i 

where fiQ ~ Ci(log(i)'^^, and Ci and C2 are fixed constants. 

This implies the incoherence conditions (AO) and (Al) of 
[10], specialized to the case of positive semidefinite matrices, 
with ^0 as given above and pi — po\/r. Combining with the 
results of [ 1 0] shows that ordinary matrix completion, with 



matrix elements sampled independently at random, will suc- 
ceed. This guarantee does not extend to the hybrid method, 
however 

Proof of Observation 3: First consider a single vector |6i), 
and define Z = ||i?|&i)||2. We will compute the k'th moment 
of Z; 



= Tr(£; ~ ■ 



E[|6i)(6i|®'=]£;®'=). 



We want to compute E[|6i)(6i| 
random unit vector in (C*, and let 



Let be a Haar- 



A = E[|6i)(6i|®^-]-E[|'Ui)K|^'=]. 
By the definition of an approximate unitary fc-design, every 
matrix element of A has absolute value at most e/d'^. Thus 
||A||2 < s. A well-known (c.f. e.g. Def. 2.1 in [28]) corol- 
lary of Schur's Lemma states E[|ui)(ui|®''] = ns/dim(S'), 
where S is the symmetric subspace of ((D'^)®'^, Us is the pro- 
jector onto S, and dim(5) = {''^l^^) ■ So we have 



E[|6i)(6i 



Substituting in, we get: 



dim(S') 



A. 



< 



+ 

dim(iS') 



< 



dim(S') 
r^k\ 



(d + fc - 1) . 



Tr£;®'^'A 

i + |li?«'=|l2|lA|l2 



i—r ^rk\^ 
— + eVr'^ < — 
■a \ d 



Using Markov's inequality, and setting t = {rk / d) ■ d^/'^ < 
(r/d) ■ poly(log d), we get 



Pr[Z >t]< 



E[Z 



frk\^ 
- Vtd, 



1 



This proves the claim for a single vector Now take the 
union bound over all the vectors |6i), i = 1, . . . , d. □ 



